import pandas as pd
import numpy as np
from lets_plot import *
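LetsPlot.setup_html(isolated_frame=True)
# Assumption: the data file named in the project prompt sits next to this notebook; adjust the path as needed.
df = pd.read_csv("names_year.csv")
Course DS 250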
Aidan Pfunder
For Project 1 the answer to each question should include a chart and a written response. The year labels on your charts should not include a comma. At least two of your charts must include reference marks.
How does your name at your birth year compare to its use historically?
My name “Aidan” was not used very often before my birth year of 2001, but shot up in popularity in the early 2000s.
If you talked to someone named Brittany on the phone, what is your guess of his or her age? What ages would you not guess?
If I talked to someone named Brittany on the phone, I would guess she was born in 1990, making her 35 years old. Based on the graph, I would not guess that she is older than 45 or younger than 25.
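The age estimate above is simple year arithmetic, sketched here. The reference year of 2025 is an assumption implied by the answer (born 1990, age 35), and the helper names are made up for illustration.

```python
# Assumed reference year, implied by "born in 1990, making her 35 years old".
CURRENT_YEAR = 2025

def age_in(birth_year: int, current_year: int = CURRENT_YEAR) -> int:
    """Approximate age in current_year for someone born in birth_year."""
    return current_year - birth_year

def birth_year_for(age: int, current_year: int = CURRENT_YEAR) -> int:
    """Approximate birth year for someone who is `age` years old in current_year."""
    return current_year - age

peak_year = 1990  # peak year read off the chart in the answer above
print(age_in(peak_year))                       # 35
print(birth_year_for(45), birth_year_for(25))  # 1980 2000
```

Under that assumption, the "not older than 45 or younger than 25" range corresponds to birth years from 1980 to 2000.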
Mary, Martha, Peter, and Paul are all Christian names. From 1920 - 2000, compare the name usage of each of the four names in a single chart. What trends do you notice?
Mary was the most popular of the 4 names. All 4 names dropped in popularity over time, but Mary dropped the most.
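The claim that Mary dropped the most could be checked numerically as well as visually. The sketch below is one possible approach, not part of the original analysis: the helper `change_between` is hypothetical, and it assumes the data has exactly one row per name and year with the `name`, `year`, and `Total` columns used in the chart code.

```python
import pandas as pd

def change_between(df: pd.DataFrame, names: list, start: int, end: int) -> pd.DataFrame:
    """Per-name Total in the start and end years, plus the absolute change.

    Assumes one row per (name, year); pivot raises an error otherwise.
    """
    subset = df[df["name"].isin(names) & df["year"].isin([start, end])]
    wide = subset.pivot(index="name", columns="year", values="Total")
    wide["change"] = wide[end] - wide[start]
    # Largest drop (most negative change) first
    return wide.sort_values("change")

# Possible usage once df is loaded:
# change_between(df, ["Mary", "Martha", "Peter", "Paul"], 1920, 2000)
```

The first row of the result is the name with the largest absolute drop between the two years.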
names = df[df["name"].isin(["Mary", "Martha", "Peter", "Paul"])]
names = names[(names["year"] >= 1920) & (names["year"] <= 2000)]
(ggplot(names, aes(x="year", y="Total", color="name")) +
    geom_line(size=1.2) +
    scale_x_continuous(format="d") +
    ggtitle("Trends of Mary, Martha, Peter, and Paul (1920–2000)") +
    xlab("Year") + ylab("Total Names"))
Think of a unique name from a famous movie. Plot the usage of that name and see how changes line up with the movie release. Does it look like the movie had an effect on usage?
This graph shows that the name “Tony” was at its most popular around 1960, with a sharp decline afterward. I thought the release of Iron Man in 2008 would have some effect; however, it seems to have changed nothing, and the name continued to drop in popularity after the movie.
Reproduce the chart Elliot using the data from the names_year.csv file.
The name Elliot seemed to lose popularity after the first release of E.T. It gained popularity after the second release, fell back down, and then really started to rise in 2000, around the third release. It is difficult to determine whether the movie releases contributed to the name’s popularity, because it seems to rise and fall independently of them.
elliot = df[df["name"] == "Elliot"].copy()
elliot["label"] = "Elliot"
# Labels sit one year to either side of each reference line so the text does not overlap it.
annotations = pd.DataFrame({
    "year": [1981, 1986, 2003],
    "Total": [1250, 1250, 1250],
    "label": ["E.T Released", "Second", "Third"],
    "hjust": [1, 0, 0]
})
plot = (
    ggplot(elliot, aes(x="year", y="Total", color="label")) +
    geom_line(size=1.2, alpha=0.7) +
    # Dashed reference lines at the three release years
    geom_vline(xintercept=1982, color="red", linetype="dashed", size=1) +
    geom_vline(xintercept=1985, color="red", linetype="dashed", size=1) +
    geom_vline(xintercept=2002, color="red", linetype="dashed", size=1) +
    geom_text(
        data=annotations,
        mapping=aes(x="year", y="Total", label="label", hjust="hjust"),
        color="black", size=7
    ) +
    scale_color_manual(values={"Elliot": "blue"}) +
    scale_x_continuous(format="d") +
    ggtitle("Elliot... What?") +
    xlab("year") + ylab("Total")
)
plot
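Whether usage rose or fell around a release can also be estimated with a rough before-and-after comparison. This is a minimal sketch, not a causal test: the helper `change_after` and its three-year window are assumptions chosen for illustration, and it expects the same `name`, `year`, and `Total` columns used above.

```python
import pandas as pd

def change_after(df: pd.DataFrame, name: str, event_year: int, window: int = 3) -> float:
    """Mean Total in the `window` years after event_year minus the mean
    in the `window` years before it. The event year itself is excluded."""
    rows = df[df["name"] == name]
    before = rows[(rows["year"] >= event_year - window) & (rows["year"] < event_year)]
    after = rows[(rows["year"] > event_year) & (rows["year"] <= event_year + window)]
    return after["Total"].mean() - before["Total"].mean()

# Possible usage with the release years marked on the Elliot chart:
# for year in (1982, 1985, 2002):
#     print(year, change_after(df, "Elliot", year))
```

A positive value means average usage was higher after the release than before it. The same helper could be applied to "Tony" with 2008, though a change in either direction may simply reflect a trend that was already under way.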